AITopics | information state

Collaborating Authors

information state

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

On the Reliability Limits of LLM-Based Multi-Agent Planning

Ao, Ruicheng, Gao, Siyang, Simchi-Levi, David

arXiv.org Machine LearningMar-31-2026

This technical note studies the reliability limits of LLM-based multi-agent planning as a delegated decision problem. We model the LLM-based multi-agent architecture as a finite acyclic decision network in which multiple stages process shared model-context information, communicate through language interfaces with limited capacity, and may invoke human review. We show that, without new exogenous signals, any delegated network is decision-theoretically dominated by a centralized Bayes decision maker with access to the same information. In the common-evidence regime, this implies that optimizing over multi-agent directed acyclic graphs under a finite communication budget can be recast as choosing a budget-constrained stochastic experiment on the shared signal. We also characterize the loss induced by communication and information compression. Under proper scoring rules, the gap between the centralized Bayes value and the value after communication admits an expected posterior divergence representation, which reduces to conditional mutual information under logarithmic loss and to expected squared posterior error under the Brier score. These results characterize the fundamental reliability limits of delegated LLM planning. Experiments with LLMs on a controlled problem set further demonstrate these characterizations.

artificial intelligence, communication, information, (16 more...)

arXiv.org Machine Learning

2603.26993

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Kosovo > District of Gjilan > Kamenica (0.04)
Asia > China > Hong Kong > Kowloon (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Add feedback

71b52a5b3fe2e9303433a174b60e160d-Paper-Conference.pdf

Neural Information Processing SystemsFeb-15-2026, 19:44:51 GMT

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(6 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback

faa6276ea12d7afeb3e42b210c86f688-Paper-Conference.pdf

Neural Information Processing SystemsFeb-13-2026, 00:13:44 GMT

blueprint, deviation, improvised, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Oregon (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.95)

Add feedback

56bd37d3a2fda0f2f41925019c81011d-Paper.pdf

Neural Information Processing SystemsFeb-12-2026, 05:51:43 GMT

attacker, defender, information, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
North America > Canada (0.04)
Asia > Malaysia (0.04)

Industry:

Government > Military (0.93)
Information Technology > Security & Privacy (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Add feedback

Inclusive Fitness as a Key Step Towards More Advanced Social Behaviors in Multi-Agent Reinforcement Learning Settings

Rosseau, Andries, Avalos, Raphaël, Nowé, Ann

arXiv.org Artificial IntelligenceOct-15-2025

The competitive and cooperative forces of natural selection have driven the evolution of intelligence for millions of years, culminating in nature's vast biodiversity and the complexity of human minds. Inspired by this process, we propose a novel multi-agent reinforcement learning framework where each agent is assigned a genotype and where reward functions are modelled after the concept of inclusive fitness. An agent's genetic material may be shared with other agents, and our inclusive reward function naturally accounts for this. We study the resulting social dynamics in two types of network games with prisoner's dilemmas and find that our results align with well-established principles from biology, such as Hamilton's rule. Furthermore, we outline how this framework can extend to more open-ended environments with spatial and temporal structure, finite resources, and evolving populations. We hypothesize the emergence of an arms race of strategies, where each new strategy is a gradual improvement over earlier adaptations of other agents, effectively producing a multi-agent autocurriculum analogous to biological evolution. In contrast to the binary team-based structures prevalent in earlier research, our gene-based reward structure introduces a spectrum of cooperation ranging from full adversity to full cooperativeness based on genetic similarity, enabling unique non team-based social dynamics. For example, one agent having a mutual cooperative relationship with two other agents, while the two other agents behave adversarially towards each other. We argue that incorporating inclusive fitness in agents provides a foundation for the emergence of more strategically advanced and socially intelligent agents.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2510.12555

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine (0.69)
Leisure & Entertainment > Games > Computer Games (0.68)
Leisure & Entertainment > Games > Backgammon (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Periodic agent-state based Q-learning for POMDPs

Neural Information Processing SystemsOct-10-2025, 05:54:34 GMT

The standard approach for Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP . However, the belief state depends on the system model and is therefore not viable in reinforcement learning (RL) settings. A widely used alternative is to use an agent state, which is a model-free, recursively updateable function of the observation history. Examples include frame stacking and recurrent neural networks. Since the agent state is model-free, it is used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms like Q-learning learn a stationary policy.

agent state, asql, markov chain, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(5 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Collaborative and Proactive Management of Task-Oriented Conversations

Saedi, Arezoo, Fatemi, Afsaneh, Nematbakhsh, Mohammad Ali, Rosset, Sophie, Vilnat, Anne

arXiv.org Artificial IntelligenceOct-8-2025

Task oriented dialogue systems (TOD) complete particular tasks based on user preferences across natural language interactions. Considering the impressive performance of large language models (LLMs) in natural language processing (NLP) tasks, most of the latest TODs are centered on LLMs. While proactive planning is crucial for task completion, many existing TODs overlook effective goal-aware planning. This paper creates a model for managing task-oriented conversations, conceptualized centered on the information state approach to dialogue management. The created model incorporated constructive intermediate information in planning. Initially, predefined slots and text part informational components are created to model user preferences. Investigating intermediate information, critical circumstances are identified. Informational components corresponding to these circumstances are created. Possible configurations for these informational components lead to limited information states. Then, dialogue moves, which indicate movement between these information states and the procedures that must be performed in the movements, are created. Eventually, the update strategy is constructed. The created model is implemented leveraging in-context learning of LLMs. In this model, database queries are created centered on indicated predefined slots and the order of retrieved entities is indicated centered on text part. This mechanism enables passing the whole corresponding entities to the preferences in the order of congruency. Evaluations exploiting the complete test conversations of MultiWOZ, with no more than a domain in a conversation, illustrate maximal inform and success, and improvement compared with previous methods.

large language model, machine learning, predefined slot, (22 more...)

arXiv.org Artificial Intelligence

2510.0511

Country: Asia (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Non-Cooperative Inverse Reinforcement Learning

Neural Information Processing SystemsOct-2-2025, 18:32:24 GMT

Making decisions in the presence of a strategic opponent requires one to take into account the opponent's ability to actively mask its intended objective. To describe such strategic situations, we introduce the non-cooperative inverse reinforcement learning (N-CIRL) formalism. The N-CIRL formalism consists of two agents with completely misaligned objectives, where only one of the agents knows the true objective function.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: North America (0.28)

Industry:

Government > Military (0.93)
Information Technology > Security & Privacy (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Add feedback

Convergence of regularized agent-state-based Q-learning in POMDPs

Sinha, Amit, Geist, Matthieu, Mahajan, Aditya

arXiv.org Artificial IntelligenceSep-4-2025

In this paper, we present a framework to understand the convergence of commonly used Q-learning reinforcement learning algorithms in practice. Two salient features of such algorithms are: (i)~the Q-table is recursively updated using an agent state (such as the state of a recurrent neural network) which is not a belief state or an information state and (ii)~policy regularization is often used to encourage exploration and stabilize the learning algorithm. We investigate the simplest form of such Q-learning algorithms which we call regularized agent-state-based Q-learning (RASQL) and show that it converges under mild technical conditions to the fixed point of an appropriately defined regularized MDP, which depends on the stationary distribution induced by the behavioral policy. We also show that a similar analysis continues to work for a variant of RASQL that learns periodic policies. We present numerical examples to illustrate that the empirical convergence behavior matches with the proposed theoretical limit.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2508.21314

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.64)

Technology: